---
title: Data ingest
description: DataRobot Location AI enables tapping into existing geospatial data sources through a variety of pathways.
---

# Data ingest {: #data-ingest }

DataRobot Location AI enables tapping into existing geospatial data sources through a variety of pathways, including:

* Native geospatial files
* Spatially enabled database tables
* Auto-recognized spatial coordinates
* User transformations to location variable type

Connecting directly to geospatial data saves the time and resources otherwise required to export from native geospatial data formats in a Geographic Information System (GIS) or a data preparation tool. Because DataRobot Location AI also automatically recognizes geospatial data in non-native formats, users who are not traditional geospatial analysts can work explicitly with spatial data.

##  Native geospatial data {: #native-geospatial-data }

DataRobot Location AI supports ingest of these native geospatial data formats:

* ESRI Shapefiles
* GeoJSON
* ESRI File Geodatabase
* Well Known Text (embedded in table column)
* PostGIS Databases

Native geospatial file formats are uploaded to DataRobot in [the same way](import-to-dr) as non-geospatial formats, for example by drag-and-drop, by URL upload, or through the **AI Catalog**.

###  ESRI Shapefiles {: #esri-shapefiles }

<a target="_blank" href="https://en.wikipedia.org/wiki/Shapefile">ESRI Shapefiles</a> are a common native geospatial format, created in the late 1990s and still in wide use today. A Shapefile is a multifile format that requires, at a minimum, files with the `.shp`, `.shx`, and `.dbf` extensions. Because of the multifile nature of the format, DataRobot Location AI accepts ZIP archives that include these files along with the additional `.prj` file describing the <a target="_blank" href="https://en.wikipedia.org/wiki/Spatial_reference_system">Coordinate Reference System (CRS)</a> for the data.
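As an illustration of preparing such an archive, the following minimal sketch bundles the component files of a Shapefile into a single ZIP using only the Python standard library. The function name `zip_shapefile` and the file names in the usage note are hypothetical, and treating `.prj` as optional (included only when present) is an assumption of this sketch rather than a documented DataRobot requirement.

```python
import zipfile
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")  # minimum Shapefile components
EXTRA = (".prj",)                    # CRS definition; assumed optional in this sketch


def zip_shapefile(shp_path, zip_path=None):
    """Bundle a Shapefile's component files into one ZIP archive."""
    shp_path = Path(shp_path)
    zip_path = Path(zip_path) if zip_path else shp_path.with_suffix(".zip")

    # Fail early if any of the mandatory sidecar files is absent.
    missing = [ext for ext in REQUIRED if not shp_path.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"Missing Shapefile components: {missing}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in REQUIRED + EXTRA:
            part = shp_path.with_suffix(ext)
            if part.exists():
                # Store each file at the archive root, without parent folders.
                archive.write(part, arcname=part.name)
    return zip_path
```

For example, `zip_shapefile("parcels.shp")` would write `parcels.zip` next to the source files, ready for upload.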

###  GeoJSON {: #geojson }

<a target="_blank" href="https://en.wikipedia.org/wiki/GeoJSON">GeoJSON</a> is a more recent geospatial file format, often used in web mapping applications, and is published as a specification (RFC 7946) by the Internet Engineering Task Force (IETF). Unlike ESRI Shapefiles, GeoJSON is a single-file format that describes the <a target="_blank" href="https://en.wikipedia.org/wiki/Spatial_reference_system">Coordinate Reference System (CRS)</a> within the file itself.
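To make the single-file structure concrete, here is a minimal sketch that writes a one-feature GeoJSON file with the Python standard library. The helper name `write_geojson`, the output path, and the `name` and `price` properties are hypothetical. Note that RFC 7946 orders positions as longitude, then latitude, in WGS 84.

```python
import json


def write_geojson(path, lon, lat, properties):
    """Write a FeatureCollection containing a single Point feature."""
    feature_collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON positions are [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                # Non-spatial attributes travel in "properties".
                "properties": properties,
            }
        ],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(feature_collection, f, indent=2)
    return feature_collection
```

A call such as `write_geojson("sites.geojson", -46.63333, -23.55, {"name": "Sample site", "price": 250000})` produces a file in which geometry and attributes live together.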

###  ESRI File Geodatabase {: #esri-file-geodatabase }

<a target="_blank" href="https://www.loc.gov/preservation/digital/formats/fdd/fdd000294.shtml">ESRI File Geodatabase</a> is a proprietary format that approximates a database through a nested folder structure. Location AI can read a File Geodatabase directory (with extension `.gdb`) in a ZIP archive with extension `.gdb.zip`. Location AI reads the first layer in a Geodatabase file.
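The packaging described above can be sketched with the Python standard library as follows. The function name `zip_geodatabase` and the `parcels.gdb` folder are hypothetical; the sketch assumes the ZIP should contain the `.gdb` folder itself, as the description above suggests.

```python
import shutil
from pathlib import Path


def zip_geodatabase(gdb_dir):
    """Archive a File Geodatabase folder as <name>.gdb.zip beside it."""
    gdb_dir = Path(gdb_dir).resolve()
    if not gdb_dir.is_dir() or gdb_dir.suffix != ".gdb":
        raise ValueError(f"Expected a .gdb directory, got: {gdb_dir}")

    archive = shutil.make_archive(
        base_name=str(gdb_dir),        # ".zip" is appended, giving <name>.gdb.zip
        format="zip",
        root_dir=str(gdb_dir.parent),  # archive paths are relative to this folder
        base_dir=gdb_dir.name,         # keep the <name>.gdb/ folder inside the ZIP
    )
    return Path(archive)
```

Calling `zip_geodatabase("parcels.gdb")` would create `parcels.gdb.zip` in the same parent folder.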

###  Well Known Text {: #well-known-text }

<a target="_blank" href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry">Well Known Text (WKT)</a> is a markup language described in the Open Geospatial Consortium’s (OGC) <a target="_blank" href="https://www.ogc.org/standards/sfa">Simple Feature Access specification</a>. WKT is a versatile representation of vector geospatial geometries and can be used in any of DataRobot AutoML’s existing file types as a feature that describes the geometry associated with a row. See the “WKT” column in the figure below.

![](images/lai-2.png)
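A table like the one pictured can be produced with any tool that writes delimited text. The following minimal sketch uses Python's `csv` module; the helper name `write_wkt_csv` and the `id` and `price` columns are hypothetical. Because WKT geometries contain commas, the values must be quoted in a CSV file, which `csv.DictWriter` handles automatically.

```python
import csv

# Each row carries ordinary attributes plus one WKT geometry (x y = longitude latitude).
SAMPLE_ROWS = [
    {"id": 1, "price": 250000, "WKT": "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"},
    {"id": 2, "price": 410000, "WKT": "POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2))"},
]


def write_wkt_csv(path, rows):
    """Write rows to CSV with the geometry stored as text in a WKT column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "price", "WKT"])
        writer.writeheader()
        writer.writerows(rows)  # fields containing commas are quoted automatically
```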

###  PostGIS Databases {: #postgis-databases }

Configuring PostGIS ingest follows the same workflow as non-geospatial databases.

##  Auto-recognition of location data {: #auto-recognition-of-location-data }

In addition to native geospatial data ingest, DataRobot Location AI can automatically detect location data within non-geospatial formats. It recognizes location variables when the column names contain **latitude** and **longitude** and the columns contain values in these formats:

* Decimal degrees

* Degrees minutes seconds
	* -46° 37′ 59.988″ and -23° 33′
	* 46.63333W and 23.55S
	* 46\*37′59.98"W and 23\*33′S
	* W 46D 37m 59.988s and S 23D 33m
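The two notations are related by simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the western and southern hemispheres. The sketch below illustrates that conversion only; it is not DataRobot's parser, and the function name `dms_to_decimal` is hypothetical.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere=None):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    A negative `degrees` value or a "W"/"S" hemisphere yields a negative result.
    """
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    negative = degrees < 0 or (hemisphere or "").upper() in ("W", "S")
    return -magnitude if negative else magnitude
```

Under this formula, the samples `-46° 37′ 59.988″` and `46.63333W` describe the same longitude (about -46.63333), and `-23° 33′` and `23.55S` describe the same latitude (-23.55).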

DataRobot marks geometry features created as the result of auto-recognized spatial coordinates with an icon on the **Data** page.

![](images/lai-3.png)

##  User transformation to location data {: #user-transformation-to-location-data }

When spatial coordinates embedded in non-geospatial file formats are not recognized, you can still use DataRobot [variable type transform](feature-transforms#variable-type-transformations) functionality to create a location feature. To transform data into a location feature:

![](images/lai-4.png)

1. Navigate to one of the parent coordinate features and expand the feature listing; select **Var Type Transform** from the feature menu.

2. In the Numeric/Categorical Transformation dialog, select **Location** from the **Transform Numeric/Categorical to** dropdown.

3. Two additional dropdown menus appear&mdash;**Latitude** and **Longitude**. Select from the existing feature set to specify the parent coordinates.

4. Click **Create feature**.

The new feature appears after its parent feature as a new row in the **Data** table, noted with an icon indicating it is user-created.

![](images/lai-5.png)

##  Location variable type {: #location-variable-type }

In addition to the traditional variable types of numeric, categorical, and date, Location AI adds a location variable type to provide explicit treatment of spatial data in DataRobot models.

![](images/lai-6.png)

The location variable type supports the 2D geometric primitives specified in the OGC Simple Feature Access specification, along with some multipart geometries:

* Point/MultiPoint
* LineString/MultiLineString
* Polygon/MultiPolygon
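For reference, the sketch below pairs each of these geometry types with a sample WKT string. The coordinates are arbitrary illustrative values and the helper `geometry_type` is hypothetical; it simply reads the keyword that begins a WKT string.

```python
# Sample WKT for each geometry type; polygon rings repeat their first point to close.
WKT_EXAMPLES = {
    "Point": "POINT (30 10)",
    "MultiPoint": "MULTIPOINT ((10 40), (40 30))",
    "LineString": "LINESTRING (30 10, 10 30, 40 40)",
    "MultiLineString": "MULTILINESTRING ((10 10, 20 20), (40 40, 30 30))",
    "Polygon": "POLYGON ((30 10, 40 40, 20 40, 30 10))",
    "MultiPolygon": "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), "
                    "((15 5, 40 10, 10 20, 15 5)))",
}


def geometry_type(wkt):
    """Return the upper-case geometry keyword that starts a WKT string."""
    return wkt.split("(", 1)[0].strip().upper()
```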

Location variables improve DataRobot’s ability to handle location data throughout the AutoML workflow, including model blueprints, feature importance calculations, and visualizations.
